20 research outputs found

    Topological inference in graphs and images

    Topological image modification for object detection and topological image processing of skin lesions

    We propose a new method based on Topological Data Analysis (TDA), consisting of Topological Image Modification (TIM) and Topological Image Processing (TIP), for object detection. Through this newly introduced method, we artificially destroy irrelevant objects and construct new objects with known topological properties in irrelevant regions of an image. This ensures that we are able to identify the important objects in relevant regions of the image. We do this by means of persistent homology, which allows us to simultaneously select appropriate thresholds as well as the objects corresponding to these thresholds, and to separate them from the noisy background of an image. This leads to a new image, processed in a completely unsupervised manner, from which one may more efficiently extract important objects. We demonstrate the usefulness of this proposed method for topological image processing through a case study of unsupervised segmentation of the ISIC 2018 skin lesion images. Code for this project is available at https://bitbucket.org/ghentdatascience/topimgprocess
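
    The abstract above relies on persistent homology to pick thresholds that separate salient objects from noisy background. The authors' actual code is at the Bitbucket URL; the following is only a minimal sketch of the underlying idea, computing 0-dimensional persistence of the superlevel sets of a 1D signal (an image row, say) with a union-find structure. Long-lived (birth, death) pairs correspond to salient peaks, short-lived ones to noise.

    ```python
    def persistence_1d(values):
        """0-dimensional persistence of superlevel sets of a 1D signal.

        Components are born at local maxima and die when they merge into
        an older (higher-born) component (the "elder rule"). Persistent
        components correspond to salient objects; pairs with small
        birth - death are noise.
        """
        n = len(values)
        parent = [None] * n          # union-find forest over activated indices
        birth = {}                   # root index -> birth value

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        pairs = []                   # (birth, death) of components that die
        # activate indices from highest to lowest value (superlevel filtration)
        for i in sorted(range(n), key=lambda i: -values[i]):
            parent[i] = i
            birth[i] = values[i]
            for j in (i - 1, i + 1):
                if 0 <= j < n and parent[j] is not None:
                    ri, rj = find(i), find(j)
                    if ri != rj:
                        # elder rule: the component born lower dies here
                        young, old = sorted((ri, rj), key=lambda r: birth[r])
                        pairs.append((birth[young], values[i]))
                        parent[young] = old
        # the last surviving component never merges; pair it with the minimum
        root = find(0)
        pairs.append((birth[root], min(values)))
        # most persistent pairs first
        return sorted(pairs, key=lambda p: p[1] - p[0])
    ```

    On a toy signal with two peaks, the two most persistent pairs recover the peak heights, and their death values are natural segmentation thresholds.
    
    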

    Graph approximations to geodesics on metric graphs

    In machine learning, high-dimensional point clouds are often assumed to be sampled from a topological space whose intrinsic dimension is significantly lower than the representation dimension. Proximity graphs, such as the Rips graph or kNN graph, are often used as an intermediate representation to learn or visualize topological and geometrical properties of this space. The key idea behind this approach is that distances on the graph preserve the geodesic distances on the unknown space well, and as such can be used to infer local and global geometric patterns of this space. Prior results provide us with conditions under which these distances are well preserved for geodesically convex, smooth, compact manifolds. Yet proximity graphs are ideal representations for a much broader class of spaces, such as metric graphs, i.e., graphs embedded in Euclidean space. It turns out, as proven in this paper, that these existing conditions cannot be straightforwardly adapted to such spaces. In this work, we provide novel, flexible, and insightful characteristics and results for topological pattern recognition of metric graphs to bridge this gap.
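
    The key idea in this abstract, that shortest paths on a proximity graph approximate geodesics on the underlying space, can be illustrated with a small stdlib-only sketch: sample points on the unit circle, build a symmetric kNN graph with Euclidean edge weights, and run Dijkstra. The graph distance between antipodal points then approaches the geodesic arc length pi rather than the Euclidean chord length 2.

    ```python
    import math, heapq

    def knn_graph(points, k):
        """Symmetric k-nearest-neighbour graph with Euclidean edge weights."""
        n = len(points)
        adj = [dict() for _ in range(n)]
        for i in range(n):
            # the nearest neighbour of i is i itself, so skip index 0
            nbrs = sorted(range(n), key=lambda j: math.dist(points[i], points[j]))[1:k + 1]
            for j in nbrs:
                w = math.dist(points[i], points[j])
                adj[i][j] = w
                adj[j][i] = w
        return adj

    def graph_distance(adj, src, dst):
        """Dijkstra shortest-path distance: the graph's geodesic estimate."""
        dists = {src: 0.0}
        pq = [(0.0, src)]
        while pq:
            d, u = heapq.heappop(pq)
            if u == dst:
                return d
            if d > dists.get(u, math.inf):
                continue
            for v, w in adj[u].items():
                if d + w < dists.get(v, math.inf):
                    dists[v] = d + w
                    heapq.heappush(pq, (d + w, v))
        return math.inf

    # points sampled densely on the unit circle
    n = 200
    pts = [(math.cos(2 * math.pi * t / n), math.sin(2 * math.pi * t / n)) for t in range(n)]
    adj = knn_graph(pts, k=4)
    approx = graph_distance(adj, 0, n // 2)   # distance between antipodal samples
    # true geodesic along the circle: pi; Euclidean chord: 2.0
    ```

    With dense, uniform sampling the approximation is tight; the paper's point is precisely that such guarantees need new arguments once the underlying space is a metric graph rather than a manifold.
    
    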

    Mining topological structure in graphs through forest representations

    We consider the problem of inferring simplified topological substructures, which we term backbones, in metric and non-metric graphs. Intuitively, these are subgraphs with ‘few’ nodes, multifurcations, and cycles, that model the topology of the original graph well. We present a multistep procedure for inferring these backbones. First, we encode local (geometric) information of each vertex in the original graph by means of the boundary coefficient (BC) to identify ‘core’ nodes in the graph. Next, we construct a forest representation of the graph, termed an f-pine, that connects every node of the graph to a local ‘core’ node. The final backbone is then inferred from the f-pine through CLOF (Constrained Leaves Optimal subForest), a novel graph optimization problem we introduce in this paper. On a theoretical level, we show that CLOF is NP-hard for general graphs. However, we prove that CLOF can be efficiently solved for forest graphs, a surprising fact given that CLOF induces a nontrivial monotone submodular set function maximization problem on tree graphs. This result is the basis of our method for mining backbones in graphs through forest representations. We qualitatively and quantitatively confirm the applicability, effectiveness, and scalability of our method for discovering backbones in a variety of graph-structured data, such as social networks, earthquake locations scattered across the Earth, and high-dimensional cell trajectory data.
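
    The forest-construction step described above (connecting every node to a local ‘core’ node) can be sketched generically. The paper scores vertices with its boundary coefficient; as a stand-in assumption, the sketch below uses node degree as the coreness score, which is not the paper's BC. Pointing each node at its best-scoring strictly-higher neighbour can never create a cycle, because scores strictly increase along every parent chain, so the selected edges always form a forest.

    ```python
    def core_forest(adj, score):
        """Connect every node to its best-scoring neighbour with a strictly
        higher (score, id) pair; nodes with no such neighbour become roots.
        Since the score strictly increases along each parent chain, the
        chosen edges cannot close a cycle: the result is a forest.
        """
        edges = []
        for u, nbrs in adj.items():
            better = [v for v in nbrs if (score[v], v) > (score[u], u)]
            if better:
                parent = max(better, key=lambda v: (score[v], v))
                edges.append((u, parent))
        return edges

    # tiny example: a graph whose highest-degree node 1 acts as the core
    adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1], 3: [1]}
    degree = {u: len(vs) for u, vs in adj.items()}
    forest = core_forest(adj, degree)
    ```

    In this toy graph every node attaches to node 1, yielding a star-shaped forest; the actual f-pine construction and the CLOF optimization on top of it are, of course, specific to the paper.
    
    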

    Enabling planetary science across light-years. Ariel Definition Study Report

    Ariel, the Atmospheric Remote-sensing Infrared Exoplanet Large-survey, was adopted as the fourth medium-class mission in ESA's Cosmic Vision programme, to be launched in 2029. During its 4-year mission, Ariel will study what exoplanets are made of, how they formed, and how they evolve, by surveying a diverse sample of about 1000 extrasolar planets simultaneously in visible and infrared wavelengths. It is the first mission dedicated to measuring the chemical composition and thermal structures of hundreds of transiting exoplanets, enabling planetary science far beyond the boundaries of the Solar System. The payload consists of an off-axis Cassegrain telescope (primary mirror 1100 mm x 730 mm ellipse) and two separate instruments (FGS and AIRS) simultaneously covering the 0.5-7.8 micron spectral range. The satellite is best placed into an L2 orbit to maximise the thermal stability and the field of regard. The payload module is passively cooled via a series of V-Groove radiators; the detectors for the AIRS are the only items that require active cooling, via an active Ne JT cooler. The Ariel payload is developed by a consortium of more than 50 institutes from 16 ESA countries, which include the UK, France, Italy, Belgium, Poland, Spain, Austria, Denmark, Ireland, Portugal, Czech Republic, Hungary, the Netherlands, Sweden, Norway, and Estonia, and a NASA contribution.

    The curse revisited: when are distances informative for the ground truth in noisy high-dimensional data?

    Distances between data points are widely used in machine learning applications. Yet, when corrupted by noise, these distances, and thus the models based upon them, may lose their usefulness in high dimensions. Indeed, the small marginal effects of the noise may then accumulate quickly, shifting empirical closest and furthest neighbors away from the ground truth. In this paper, we exactly characterize such effects in noisy high-dimensional data using an asymptotic probabilistic expression. Previously, it has been argued that neighborhood queries become meaningless and unstable when distance concentration occurs, meaning that there is poor relative discrimination between the furthest and closest neighbors in the data. However, we conclude that this is not necessarily the case when we decompose the data into a ground truth component, which we aim to recover, and a noise component. More specifically, we derive that under particular conditions, empirical neighborhood relations affected by noise are still likely to be truthful even when distance concentration occurs. We also include thorough empirical verification of our results, as well as interesting experiments in which our derived 'phase shift', where neighbors become random or not, turns out to be identical to the phase shift where common dimensionality reduction methods perform poorly or well for recovering low-dimensional reconstructions of high-dimensional data with dense noise.
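
    The distance concentration phenomenon this abstract starts from is easy to reproduce with the standard library: measure the relative contrast (Dmax - Dmin) / Dmin of distances from a query to random points, and watch it collapse as the dimension grows. This only illustrates the classical effect, not the paper's ground-truth-plus-noise decomposition.

    ```python
    import math, random

    def relative_contrast(d, m=200, seed=0):
        """(Dmax - Dmin) / Dmin over m uniform points in [0, 1]^d, measured
        from the origin. Distance concentration drives this ratio toward 0
        as d grows, so nearest and furthest neighbours look alike."""
        rng = random.Random(seed)
        dists = []
        for _ in range(m):
            p = [rng.random() for _ in range(d)]
            dists.append(math.sqrt(sum(x * x for x in p)))
        return (max(dists) - min(dists)) / min(dists)

    low_dim = relative_contrast(2)       # large: neighbours are well separated
    high_dim = relative_contrast(1000)   # small: distances have concentrated
    ```

    The paper's contribution is precisely that a small contrast like `high_dim` does not by itself make neighbourhood queries meaningless once a ground truth component is separated from the noise.
    
    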

    Stable topological signatures for metric trees through graph approximations

    The rising field of Topological Data Analysis (TDA) provides a new approach to learning from data through persistence diagrams, which are topological signatures that quantify topological properties of data in a comparable manner. For point clouds, these diagrams are often derived from the Vietoris-Rips filtration, based on the metric equipped on the data, which allows one to deduce topological patterns such as components and cycles of the underlying space. In metric trees, however, these diagrams often fail to capture other crucial topological properties, such as the present leaves and multifurcations. Prior methods and results for persistent homology attempting to overcome this issue mainly target Rips graphs, which are often unfavorable in case of non-uniform density across the point cloud. We therefore introduce a new theoretical foundation for learning a wider variety of topological patterns through any given graph. Given particularly powerful functions defining persistence diagrams to summarize topological patterns, including the normalized centrality or eccentricity, we prove a new stability result, explicitly bounding the bottleneck distance between the true and empirical diagrams for metric trees. This bound is tight if the metric distortion obtained through the graph and its maximal edge weight are small. Through a case study of gene expression data, we demonstrate that our newly introduced diagrams provide novel quality measures and insights into cell trajectory inference.
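
    One of the functions the abstract mentions for defining these diagrams is eccentricity. As a sketch (not the paper's code), it can be computed on any weighted graph with repeated Dijkstra runs; on a metric tree, the leaves are exactly the local maxima of this function, which is why filtrations built from it can recover leaves and multifurcations that a plain Vietoris-Rips filtration misses.

    ```python
    import heapq, math

    def eccentricities(adj):
        """Eccentricity of every node of a weighted graph: the largest
        shortest-path distance from that node to any other node.
        adj maps node -> {neighbour: edge weight}."""
        def shortest_paths(src):
            d = {src: 0.0}
            pq = [(0.0, src)]
            while pq:
                du, u = heapq.heappop(pq)
                if du > d.get(u, math.inf):
                    continue
                for v, w in adj[u].items():
                    if du + w < d.get(v, math.inf):
                        d[v] = du + w
                        heapq.heappush(pq, (du + w, v))
            return d
        return {u: max(shortest_paths(u).values()) for u in adj}

    # a path "tree" 0 - 1 - 2 with unit edge weights
    tree = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0}}
    ecc = eccentricities(tree)
    ```

    Here the two leaves attain the maximal eccentricity 2.0 and the interior node the minimum 1.0, matching the intuition that the function "grows outward" toward the leaves.
    
    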

    Topologically Regularized Data Embeddings

    Unsupervised representation learning methods are widely used for gaining insight into high-dimensional, unstructured, or structured data. In some cases, users may have prior topological knowledge about the data, such as a known cluster structure or the fact that the data is known to lie along a tree- or graph-structured topology. However, generic methods to ensure such structure is salient in the low-dimensional representations are lacking. This negatively impacts the interpretability of low-dimensional embeddings, and plausibly also downstream learning tasks. To address this issue, we introduce topological regularization: a generic approach based on algebraic topology to incorporate topological prior knowledge into low-dimensional embeddings. We introduce a class of topological loss functions, and show that jointly optimizing an embedding loss with such a topological loss function as a regularizer yields embeddings that reflect not only local proximities but also the desired topological structure. We include a self-contained overview of the required foundational concepts in algebraic topology, and provide intuitive guidance on how to design topological loss functions for a variety of shapes, such as clusters, cycles, and bifurcations. We empirically evaluate the proposed approach on computational efficiency, robustness, and versatility in combination with linear and non-linear dimensionality reduction and graph embedding methods.
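
    To make the notion of a "topological loss function" concrete, here is an illustrative 0-dimensional example, not taken from the paper: to encourage an embedding to exhibit k clusters, sum the edge lengths of its Euclidean minimum spanning tree while excluding the k - 1 longest edges. Minimizing this value pulls points together within clusters while leaving the k - 1 largest gaps (the cluster separations) unpenalized. In actual topological regularization such a loss would be differentiated with respect to the coordinates and added to an embedding loss; the sketch only evaluates it.

    ```python
    import math
    from itertools import combinations

    def cluster_loss(points, k):
        """Illustrative 0-dimensional topological loss: total MST edge
        length of the embedding, excluding the k - 1 longest edges.
        The excluded edges correspond to the gaps between the k clusters
        we want the embedding to keep."""
        n = len(points)
        edges = sorted((math.dist(points[i], points[j]), i, j)
                       for i, j in combinations(range(n), 2))
        parent = list(range(n))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        mst = []
        for w, i, j in edges:          # Kruskal's algorithm
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                mst.append(w)
        mst.sort()
        return sum(mst[:max(0, len(mst) - (k - 1))])

    # two tight pairs of points far apart: a clean 2-cluster embedding
    pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0)]
    ```

    For these points, `cluster_loss(pts, 2)` charges only the two within-pair edges, while `cluster_loss(pts, 1)` also charges the large gap, so a k = 2 regularizer prefers exactly this configuration.
    
    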